{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PyTorch Tutorial\n",
    "MILA, November 2017\n",
    "\n",
    "By Sandeep Subramanian"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## An introduction to the PyTorch neural network library\n",
    "\n",
    "### `torch.nn` & `torch.optim`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from __future__ import print_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "import torch.nn.init as init\n",
    "import torch.nn.functional as F\n",
    "from torch.autograd import Variable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### torch.nn\n",
    "\n",
    "Neural networks can be constructed using the `torch.nn` package.\n",
    "\n",
    "Provides pretty much all neural network related functionalities such as :\n",
    "\n",
    "1. Linear layers - `nn.Linear`, `nn.Bilinear`\n",
    "2. Convolution Layers - `nn.Conv1d`, `nn.Conv2d`, `nn.Conv3d`, `nn.ConvTranspose2d`\n",
    "3. Nonlinearities - `nn.Sigmoid`, `nn.Tanh`, `nn.ReLU`, `nn.LeakyReLU`\n",
    "4. Pooling Layers - `nn.MaxPool1d`, `nn.AveragePool2d`\n",
    "4. Recurrent Networks - `nn.LSTM`, `nn.GRU`\n",
    "5. Normalization - `nn.BatchNorm2d`\n",
    "6. Dropout - `nn.Dropout`, `nn.Dropout2d`\n",
    "7. Embedding - `nn.Embedding`\n",
    "8. Loss Functions - `nn.MSELoss`, `nn.CrossEntropyLoss`, `nn.NLLLoss`\n",
    "\n",
    "Instances of these classes will have an `__call__` function built-in that can be used to run an input through the layer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Linear, Bilinear & Nonlinearities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Linear output size :  torch.Size([32, 20])\n",
      "Bilinear output size :  torch.Size([32, 50])\n"
     ]
    }
   ],
   "source": [
    "x = Variable(torch.randn(32, 10))\n",
    "y = Variable(torch.randn(32, 30))\n",
    "\n",
    "sigmoid = nn.Sigmoid()\n",
    "\n",
    "linear = nn.Linear(in_features=10, out_features=20, bias=True)\n",
    "output_linear = linear(x)\n",
    "print('Linear output size : ', output_linear.size())\n",
    "\n",
    "bilinear = nn.Bilinear(in1_features=10, in2_features=30, out_features=50, bias=True)\n",
    "output_bilinear = bilinear(x, y)\n",
    "print('Bilinear output size : ', output_bilinear.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Convolution, BatchNorm & Pooling Layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Conv output size :  torch.Size([10, 32, 28, 28])\n",
      "Pool output size :  torch.Size([10, 32, 14, 14])\n"
     ]
    }
   ],
   "source": [
    "x = Variable(torch.randn(10, 3, 28, 28))\n",
    "\n",
    "conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1, bias=True)\n",
    "bn = nn.BatchNorm2d(num_features=32)\n",
    "pool = nn.MaxPool2d(kernel_size=(2, 2), stride=2)\n",
    "\n",
    "output_conv = bn(conv(x))\n",
    "outpout_pool = pool(conv(x))\n",
    "\n",
    "print('Conv output size : ', output_conv.size())\n",
    "print('Pool output size : ', outpout_pool.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Recurrent, Embedding & Dropout Layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embedding size :  torch.Size([5, 3, 20])\n",
      "GRU hidden states size :  torch.Size([5, 3, 100])\n",
      "GRU last hidden state size :  torch.Size([4, 5, 50])\n"
     ]
    }
   ],
   "source": [
    "inputs = [[1, 2, 3], [1, 0, 4], [1, 2, 4], [1, 4, 0], [1, 3, 3]]\n",
    "x = Variable(torch.LongTensor(inputs))\n",
    "\n",
    "embedding = nn.Embedding(num_embeddings=5, embedding_dim=20, padding_idx=1)\n",
    "drop = nn.Dropout(p=0.5)\n",
    "gru = nn.GRU(input_size=20, hidden_size=50, num_layers=2, batch_first=True, bidirectional=True, dropout=0.3)\n",
    "\n",
    "emb = drop(embedding(x))\n",
    "gru_h, gru_h_t = gru(emb)\n",
    "\n",
    "print('Embedding size : ', emb.size())\n",
    "print('GRU hidden states size : ', gru_h.size())\n",
    "print('GRU last hidden state size : ', gru_h_t.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### torch.nn.functional\n",
    "\n",
    "Using the above classes requires defining an instance of the class and then running inputs through the instance.\n",
    "\n",
    "The functional API provides users a way to use these classes in a `functional` way. Such as\n",
    "\n",
    "`import torch.nn.functional as F`\n",
    "\n",
    "1. Linear layers - `F.linear(input=x, weight=W, bias=b)`\n",
    "2. Convolution Layers - `F.conv2d(input=x, weight=W, bias=b, stride=1, padding=0, dilation=1, groups=1)`\n",
    "3. Nonlinearities - `F.sigmoid(x), F.tanh(x), F.relu(x), F.softmax(x)`\n",
    "4. Dropout - `F.dropout(x, p=0.5, training=True)`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A few examples of the functional API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Conv output size :  torch.Size([10, 32, 28, 28])\n"
     ]
    }
   ],
   "source": [
    "x = Variable(torch.randn(10, 3, 28, 28))\n",
    "filters = Variable(torch.randn(32, 3, 3, 3))\n",
    "conv_out = F.relu(F.dropout(F.conv2d(input=x, weight=filters, padding=1), p=0.5, training=True))\n",
    "\n",
    "print('Conv output size : ', conv_out.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### torch.nn.init\n",
    "\n",
    "Provides a set of functions for standard weight initialization techniques\n",
    "\n",
    "`import torch.nn.init as init`\n",
    "\n",
    "1. Calculate the gain of a layer based on the activation function - `init.calculate_gain('sigmoid')`\n",
    "2. Uniform init - `init.uniform(tensor, low, high)`\n",
    "3. Xavier uniform - `init.xavier_uniform(tensor, gain=init.calculate_gain('sigmoid'))`\n",
    "4. Xavier normal - `init.xavier_normal(tensor, gain=init.calculate_gain('tanh'))`\n",
    "5. Orthogonal - `init.orthogonal(tensor, gain=init.calculate_gain('tanh'))`\n",
    "6. Kaiming normal - `init.kaiming_normal(tensor, mode='fan_in')`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initializing convolution kernels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "conv_layer = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3), padding=1)\n",
    "for k,v in conv_layer.named_parameters():\n",
    "    if k == 'weight':\n",
    "        init.kaiming_normal(v)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### torch.optim\n",
    "\n",
    "Provides implementations of standard stochastic optimization techniques\n",
    "\n",
    "`import torch.optim as optim`\n",
    "\n",
    "    W1 = Variable(torch.randn(10, 20), requires_grad=True)\n",
    "    W2 = Variable(torch.randn(10, 20), requires_grad=True)\n",
    "\n",
    "1. SGD - `optim.SGD([W1, W2], lr=0.01, momentum=0.9, dampening=0, weight_decay=1e-2, nesterov=True)`\n",
    "2. Adam - `optim.Adam([W1, W2], lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)`\n",
    "\n",
    "#### Learning Rate Scheduling\n",
    "\n",
    "`optim.lr_scheduler`\n",
    "\n",
    "1. `optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)`\n",
    "2. `optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=True, threshold=1e-04, threshold_mode='rel', min_lr=1e-05, eps=1e-08)`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### We'll look at how to use `torch.optim` in the following tutorial"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
